Preamble
The nomenclature guidelines below explain how FlyBase assigns canonical symbols and
names to its genetic objects (genes, alleles, transposons, insertions, aberrations and
balancers). We encourage the community and journals to adhere to FlyBase-approved
symbols/names for consistency in published datasets. While these guidelines cover
most circumstances, there may be exceptional cases not clearly covered here. Please
contact FlyBase to discuss such cases or any
other aspect of the nomenclature.
1. Policy for establishing FlyBase-approved gene symbols and names
1.1. Justification for unique approved
symbols/names. It is of great value to the research community that there is a
single officially sanctioned (approved) symbol and name for each gene in FlyBase. Use
of unique symbols/names, together with corresponding unique identifiers (e.g., FBgn
numbers), minimizes ambiguity in referring to these genes in the scientific literature.
1.2. Assigning approved symbols/names. It
is inevitable that multiple synonyms for a gene arise in the literature, typically as
a result of publications on the same gene by multiple laboratories or the realization
that genes previously thought to be independent are actually part of the same genetic
unit. In such cases, FlyBase adheres to the following rules for establishing or
changing the approved gene symbol/name.
1.2.1. Chronological precedence. Approved gene
symbols/names are normally established by the earliest date of publication of the
proposed symbol/name in a peer-reviewed primary research paper. (No other form of
publication is relevant to chronological precedence.)
1.2.2. Community usage. Chronological precedence can be
overridden in favor of an alternative gene symbol/name that is clearly favored by the
research community. This can be on a gene-by-gene basis or to rationalize the
nomenclature for an entire gene family or other functional grouping.
1.2.3. Placeholders. Certain classes of generic gene
symbols/names are placeholders (see sections 2.3.1 and 2.4) and are subject to
replacement by a more meaningful symbol/name according to the rules of 1.2.1, 1.2.2
and 1.2.4. However, generic symbols/names based on a phenotype shall be retained by
FlyBase if they are re-used by the first peer-reviewed research paper to characterize
that gene and/or are clearly favored by the research community.
1.2.4. Validity criteria. Authors' preferred symbols/names
will be used as the FlyBase-approved gene symbols/names whenever possible. However,
the validity criteria set out in section 2.2 must be adhered to, and FlyBase will
modify authors' preferred gene symbols/names where necessary.
2. Gene symbols and names
2.1. Symbols versus names. The gene symbol is typically an abbreviation of the full gene name
and as such, should ordinarily consist of a minimal number of characters. The gene
symbol and name should use comparable capitalization and character sets.
2.2. Requirements of FlyBase-approved Drosophila gene symbols and names.
2.2.1. Uniqueness. Each approved gene symbol and name must be unique amongst all FlyBase-approved symbols and names.
2.2.2. Relevance. The name should allude to the gene's function, mutant phenotype or other relevant characteristic.
2.2.3. Restricted and non-permissible characters. There are several characters which have specific meanings in a genotype string. Use of these characters in a gene symbol would complicate interpretation of genotypes. Therefore, approved gene symbols shall adhere to the following rules:
2.2.3.1. Approved symbols shall not contain the following characters: /, \, {, }, <, >, [, ], ;, *.
2.2.3.2. Approved symbols shall not contain spaces. Where a separator is needed to keep characters from losing meaning by running together, a hyphen "-" should be used.
2.2.3.3. Approved symbols shall not contain letters from any character sets other than English or Greek.
2.2.3.4. Colons ":" shall only be used in the approved symbols of certain classes of non-protein-coding genes, genes encoded in the mitochondrial genome, and synthetic fusion genes, as described in section 2.6.
2.2.3.5. Round brackets "( )" shall only be used in certain classes of approved gene symbols as separators to designate a chromosome or an allele whose phenotype is modified by the gene in question.
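The character rules in 2.2.3.1 to 2.2.3.3 can be sketched as a small checker. This is an illustrative sketch, not FlyBase code: the function name `symbol_problems` is hypothetical, "Greek" is assumed to mean the Unicode Greek block U+0370 to U+03FF, and the check is meant for a symbol without its species prefix. Rules 2.2.3.4 and 2.2.3.5 (colons and round brackets) depend on the class of gene and are not checked.

```python
# Characters barred outright from approved symbols (rule 2.2.3.1).
FORBIDDEN = set('/\\{}<>[];*')


def symbol_problems(symbol: str) -> list[str]:
    """Return the rule violations found in a candidate gene symbol."""
    problems = []

    # Rule 2.2.3.1: no characters with a reserved meaning in genotypes.
    bad = sorted(set(symbol) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))

    # Rule 2.2.3.2: no spaces; a hyphen is the recommended separator.
    if any(ch.isspace() for ch in symbol):
        problems.append("contains whitespace; use a hyphen as separator")

    # Rule 2.2.3.3: letters must be English or Greek (assumed here to be
    # ASCII letters plus the Greek block U+0370..U+03FF).
    for ch in symbol:
        if ch.isalpha() and not (ch.isascii() or "\u0370" <= ch <= "\u03ff"):
            problems.append(f"letter outside English/Greek: {ch}")

    return problems


print(symbol_problems("Actin-5C"))   # no violations
print(symbol_problems("my gene*"))   # whitespace and a forbidden character
```

An empty list means none of the three rules was violated; it does not mean the symbol is otherwise acceptable (uniqueness and relevance still apply).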
2.2.4. Capitalization. By historical tradition, gene symbols/names begin with a lowercase letter if the gene is named for the phenotype of a recessive mutant allele. Gene symbols/names begin with an uppercase letter if they are named for the phenotype of a dominant mutant allele, or if they are named for some aspect of the wild-type molecular function or activity. Subsequent letters are normally lowercase.
2.2.5. Superscripts and subscripts. Gene symbols and names
should not normally contain superscripts or subscripts. The only exception is when an
allele name is an integral part of a gene symbol or name, e.g., su(w[a]), where the bracketed a stands for a superscripted allele designation.
2.2.6. Italicization. All gene symbols and names should be italicized.
2.2.7. Genus/species prefixes. Genes from all species, except D. melanogaster, automatically get a unique species abbreviation prefix appended to their FlyBase-approved symbol (see section 2.5.1). Any different/additional indication of a gene's origin (e.g. D, Dro or Dm) is redundant and/or ambiguous and will not form part of the FlyBase-approved gene symbol/name.
2.2.8. Symbols and names must be inoffensive.
2.3. Common prefixes.
2.3.1 Prefixes based on phenotype, EST or STS. Several generic gene symbol/name prefixes have been used for genes sharing a common mutant phenotype or originally identified by virtue of an EST or STS. A non-exhaustive list is shown below:
Prefix or symbol format* = description
BEST: = Berkeley Drosophila Genome Project EST cluster-based gene
ESTS: = European Drosophila Genome Project STS-based gene
fs(n)m, Fs(n)m = female sterile
ms(n)m, Ms(n)m = male sterile
mfs(n)m, Mfs(n)m = male & female sterile
mat(n)m, Mat(n)m = maternal
mit(n)m, Mit(n)m = mitotic mutant
NEST: = NIDDK EST Project-based gene
rst(n)m, Rst(n)m = resistance
su(a)m, Su(a)m = suppressor
* n designates the chromosome, m a distinguishing symbol, and a the gene whose phenotype is modified by the enhancer or suppressor.
Gene symbols/names using these generic prefixes are placeholders and are subject to replacement by a more meaningful symbol/name according to the rules set out in sections 1.2.1 and 1.2.4.
2.3.2. Prefixes based on common molecular function. Genes
encoding products of similar molecular function may be given symbols/names with
identical prefixes and unique suffixes. This is to be encouraged and FlyBase will
rationalize the nomenclature for an entire gene family or other functional grouping if
favored by the research community. Historically, the unique suffix may refer to a
gene's cytological location (e.g. Actin-5C, Actin-42A, Actin-57B etc). More recently,
the unique suffix may simply be an incremental numerical value (e.g. Sdic1, Sdic2,
Sdic3 etc.), or reflect some other distinguishing feature, such as orthology with a
reference data set (e.g. RpL3, RpL4, RpL5 etc.). Also see section 2.6.
2.4. Annotation IDs. Gene annotation IDs, which are distinct from gene symbols, exist for all molecularly defined gene models in the 12 sequenced species of Drosophila.
2.4.1. Format. Annotation IDs are represented in a common
way: a species-specific two-letter prefix followed by a four- or five-digit integer. For
historical reasons, there are two two-letter prefixes for D. melanogaster: CG for
protein-coding genes and CR for non-protein-coding genes. For all other species, there
is a single two-letter code to be used for gene models, regardless of which class of
gene they identify.
CG, CR = Drosophila melanogaster
GA = Drosophila pseudoobscura pseudoobscura
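As a rough illustration of the format described in 2.4.1, an annotation ID can be recognized with a regular expression. This sketch covers only the three prefixes listed here; the function name `parse_annotation_id` is hypothetical, and the prefixes of the other sequenced species would need to be added to the table.

```python
import re

# Two-letter prefix followed by a four- or five-digit integer (section 2.4.1).
ANNOTATION_ID = re.compile(r"(CG|CR|GA)(\d{4,5})")

# Only the prefixes listed in this section; other species are omitted.
SPECIES = {
    "CG": "Drosophila melanogaster",
    "CR": "Drosophila melanogaster",
    "GA": "Drosophila pseudoobscura pseudoobscura",
}


def parse_annotation_id(text):
    """Return (prefix, number, species), or None if text is not an annotation ID."""
    match = ANNOTATION_ID.fullmatch(text)
    if match is None:
        return None
    prefix, digits = match.groups()
    return prefix, int(digits), SPECIES[prefix]


print(parse_annotation_id("CG1234"))
print(parse_annotation_id("white"))
```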
2.4.2. Use as approved gene symbols. In the absence of
other information, the annotation ID is used as a placeholder for the gene symbol
(while the gene name field is left blank) and is subject to replacement by a more
meaningful symbol/name according to the rules set out in sections 1.2.1, 1.2.2 and
1.2.4.
2.5. Approved gene symbols/names for non-D. melanogaster genes. FlyBase includes genes from all species of Drosophilidae plus genes from other species that have been introduced into Drosophila.
2.5.1. Species abbreviation prefixes. For species other than
Drosophila melanogaster, the FlyBase-approved gene symbol follows a species
abbreviation indicating the species of origin. The prefix has the form 'Nnnn\', where
N is the initial letter of the genus and nnn is a unique code for a given species of
that genus, usually the first three letters of the species name. (For example, Dsim
is the species abbreviation for Drosophila simulans.)
A complete list of valid abbreviations is available on the species abbreviations page.
By convention, a 'Dmel' prefix is not
used for D. melanogaster gene symbols in FlyBase (unless this is important in
context). Gene names are not prefixed with species information.
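The prefix convention in 2.5.1 makes it straightforward to separate the species abbreviation from the gene symbol. The sketch below is a minimal illustration: the function name `split_species` is hypothetical, the abbreviation is not validated against the list of valid abbreviations, and the split is made at the first backslash only.

```python
def split_species(symbol: str, default: str = "Dmel"):
    """Split 'Nnnn\\symbol' into (species_abbreviation, bare_symbol).

    Symbols without a backslash are assumed to be D. melanogaster genes,
    since the 'Dmel' prefix is omitted by convention.
    """
    prefix, separator, rest = symbol.partition("\\")
    if separator:
        return prefix, rest
    return default, symbol


print(split_species("Dsim\\w"))
print(split_species("w"))
```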
2.5.2 Approved gene symbols/names. The FlyBase-approved gene
symbols/names may correspond to the meaningful symbol/name of the D. melanogaster orthologs, distinguished by the relevant species
prefix (as described in 2.5.1). (It should be noted that the assignment of orthology
can be problematic in the absence of whole genome sequence information.) D.
melanogaster gene symbols/names that are defined as placeholders (see sections 2.3.1
and 2.4) or contain D. melanogaster-specific cytological information should not be
used as the symbols/names of orthologs in other species.
2.6 Special cases.
2.6.1. rRNA genes. Genes encoding ribosomal RNAs have
symbols of the format 'nSrRNA', where n denotes the respective rRNA's sedimentation
rate in Svedberg units, e.g., 28SrRNA. By historical convention, the locus containing
the genes encoding the 5.8SrRNA, 18SrRNA and 28SrRNA is called bobbed (bb).
2.6.2. tRNA genes. Genes encoding transfer RNAs have
symbols of the format 'tRNA:Xn:ma', where X is the 1-letter amino-acid code (in
upper-case); n is a number signifying the particular isoform; m is the cytogenetic map
position of the gene; and a (if used) is a lower-case letter to distinguish between
functionally similar tRNA genes mapping to the same location, e.g., tRNA:S7:23Ea.
2.6.3. snRNA genes. Genes encoding small nuclear RNAs have
symbols of the format 'snRNA:XX:ma', where XX is the type of snRNA; m is the
cytogenetic map position of the gene; and a (if used) is a lower-case letter to
distinguish functionally similar snRNA genes mapping to the same location, e.g.,
snRNA:U6:96Aa.
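The tRNA and snRNA formats are regular enough to be parsed mechanically. The sketch below follows the two examples given (tRNA:S7:23Ea and snRNA:U6:96Aa). It assumes that the map position is a numbered division followed by a lettered subdivision A to F, and that the snRNA type is alphanumeric; the function name `parse_rna_symbol` is hypothetical.

```python
import re

# Cytogenetic map position, e.g. 23E or 96A (assumed form: division + subdivision).
_POSITION = r"(?P<position>\d{1,3}[A-F])"

TRNA = re.compile(
    r"tRNA:(?P<amino_acid>[A-Z])(?P<isoform>\d+):" + _POSITION + r"(?P<copy>[a-z])?"
)
SNRNA = re.compile(
    r"snRNA:(?P<type>[A-Za-z0-9]+):" + _POSITION + r"(?P<copy>[a-z])?"
)


def parse_rna_symbol(symbol):
    """Return the named parts of a tRNA or snRNA gene symbol, or None."""
    for kind, pattern in (("tRNA", TRNA), ("snRNA", SNRNA)):
        match = pattern.fullmatch(symbol)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


print(parse_rna_symbol("tRNA:S7:23Ea"))
print(parse_rna_symbol("snRNA:U6:96Aa"))
```

The optional trailing letter is returned as `None` when it is absent.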
2.6.4. snoRNA genes. Genes encoding small nucleolar RNAs
have symbols of the format 'snoRNA:X'. X usually represents the type of modification
catalyzed and/or the substrate, e.g. snoRNA:MeU2-C28, which encodes a snoRNA that
guides methylation of nucleotide C28 of the U2 snRNA; or snoRNA:Ψ28S-612, which
encodes a snoRNA that guides pseudouridylation of nucleotide 612 of the 28S rRNA. If
the substrate is unknown, then 'Or' is used in the symbol to indicate that it encodes
an 'Orphan' snoRNA. A suffix is used where necessary to distinguish functionally
similar snoRNA genes, e.g., snoRNA:Me18S-G1358b, or snoRNA:U3:9B (where the suffix is
based on cytogenetic position).
2.6.5. miRNA genes. Genes encoding microRNAs have symbols
of the format 'mir-N', where N is simply a sequential
number according to the conventions outlined in Ambros, Bartel, et al. (2003), e.g., mir-125.
2.6.6. Pseudogenes. Pseudogenes have symbols of the format
symbol_of_parental_gene-psX, where X (if used) is a number to distinguish between
multiple pseudogene copies of a particular parental gene. If only one pseudogene copy
of a particular gene has been found, it should be given the suffix -ps1.
2.6.7 Mitochondrial genes. Genes encoded by the mitochondrial genome have symbols prefixed with 'mt:', e.g., mt:ND4.
2.6.8 Ribosomal protein genes. Genes encoding ribosomal
proteins are named based on orthology to their mammalian counterparts. Genes encoding
cytoplasmic ribosomal proteins have symbols of the format 'RpSn' or 'RpLn', where S
denotes a gene encoding a protein of the small subunit and L a gene encoding a protein
of the large subunit, and n is a number reflecting orthology, e.g., RpL3, RpS6. Genes
encoding mitochondrial ribosomal proteins have symbols of a similar format and are
prefixed with 'm', e.g., mRpL1, mRpS2. Some ribosomal proteins are encoded by
duplicate genes; these are distinguished by using an a or b suffix, e.g., RpS14a and
RpS14b. Some ribosomal protein genes were originally named after a mutant phenotype,
e.g. sop or tko; these have been retained as the approved gene symbols/names in
FlyBase.
3. Allele symbols and names
3.1. Superscripts.
Alleles at a particular gene are designated by the same name and symbol and
are differentiated by distinguishing superscripts. In written text the allele
designation may be separated from that of the gene by a hyphen, e.g., white-apricot.
3.2. Symbols.
Allele symbols should be short, preferably no more than three characters long,
and cannot contain spaces, superscripts, or subscripts. Whenever possible superscript
characters should be limited to the following set:
a-z A-Z 0-9 - + : .
The + symbol is reserved for the
wild-type allele. Consecutive allele numbers should be used wherever possible.
Greek characters may be used but
are discouraged.
The character \ is reserved in all
gene symbol contexts for species identification.
The character / is reserved as a
homologue separator in genotypes and cannot be used in allele symbols.
In text in which superscripting is
not possible, such as ASCII files, superscripted text should be enclosed between
the characters [ and ].
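The square-bracket convention is easy to convert to and from a superscripted rendering. The sketch below uses HTML `<sup>` tags as the rendering target, which is an assumption made for illustration; the function names are hypothetical, and nested brackets are not handled.

```python
import re

# A bracketed run with no brackets inside it, e.g. [a] or [+t7.2].
_BRACKETED = re.compile(r"\[([^\[\]]*)\]")
_SUPERSCRIPT = re.compile(r"<sup>(.*?)</sup>")


def ascii_to_html(symbol: str) -> str:
    """Render square-bracketed text as an HTML superscript."""
    return _BRACKETED.sub(r"<sup>\1</sup>", symbol)


def html_to_ascii(symbol: str) -> str:
    """Convert HTML superscripts back to the square-bracket form."""
    return _SUPERSCRIPT.sub(r"[\1]", symbol)


print(ascii_to_html("w[a]"))
print(html_to_ascii("y<sup>1</sup>/+"))
```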
FlyBase makes exceptions to the brevity
rule when recording in vitro mutagenesis constructs that are represented
with alleles. Where these are not otherwise named FlyBase confers symbols according
to a system including the initial of the last name of the first author of the
first paper in which the allele was initially reported ('I' in the following
examples). The most frequently used classes include:
cIa = for 'construct a of Author-lastname'
Scer\UAS.cIa = for 'S. cerevisiae UAS construct a of Author-lastname'
tIa = for 'transgene a of Author-lastname'
mIa = for 'minigene a of Author-lastname'
hs.PI = for 'heat shock construct of Author-lastname'
gene_symbol.PI = for 'gene promoter fusion of Author-lastname'
In addition, exceptions have been
required for some large series of alleles and collections of mutations. Nevertheless,
brevity of allele symbols is very much to be encouraged.
3.2.1 It is unacceptable
to use, as a superscripted allele symbol, elements of the genotype in which
the allele arose, since such a designation implies something more than a trivial
connection between allele and element. Alleles that are revertants of a pre-existing
allele are an exception to this rule.
3.2.2. While historically,
the numeral 1 has been the implied superscript of nonsuperscripted
symbols, this practice has created considerable ambiguity and is now discouraged.
As with all other alleles, the numeral 1 should be explicitly designated
(e.g., sc[1], not sc).
3.2.3. For a recessive
allele of a gene named as a dominant, or a dominant allele of a gene named as
a recessive, the superscripts r and D, respectively, may be
used; e.g., Hn[r], Hn[r2], and ci[D].
3.2.4. For a wild-type
allele, a superscripted plus character may be used; e.g., b[+]
or B[+]. The plus
symbol alone implies the normal (wild-type) allele or alleles in any context,
such as y[1]/+.
It may be necessary to distinguish
among more than one 'wild-type' allele. In such cases the different wild-type
alleles should be given a distinguishing number, which would follow the + character
in the superscript, e.g., ry[+3].
3.2.5. Absence of
a particular locus may informally be noted by use of a superscript minus character
with the symbol; e.g., bb[-]. This is not acceptable as a
designation of a particular allele.
3.2.6. Revertants
or partial revertants of mutant alleles are designated by the superscript rv
followed by a distinguishing number; these are placed after the allele designator,
e.g., D[4rv32],
the 32nd revertant of D[4].
Revertants of dominant mutations that are deficiencies are treated not as alleles
but as deficiencies and are accordingly not superscripted but listed with the
distinguishing number, e.g., Df(2L)Scorv4.
3.2.7. Alleles specifying
the absence of a particular enzyme or other protein are designated by the superscript
n (null) followed by a distinguishing number or letter, e.g., Adh[n1],
or, where lack of function is lethal, by l (lethal), followed by
a distinguishing number, e.g., Nrg[l2].
3.2.8. An allele
known to be mutant but whose specific identity is unknown is given an asterisk
as an allele designation, e.g., w[*].
4. Transposons and Transgene Constructs
Transposons or transgene constructs
integrated into the Drosophila genome, if they cause a mutant phenotype, are
both alleles and aberrations (similar to other classes of aberrations that are
associated with mutant phenotypes). Where such insertions produce no mutant
phenotype, they are named purely according to aberration conventions. Where
transposon/transgene insertions produce a mutant phenotype by disrupting an
endogenous gene, they are given names both as an allele of the mutated endogenous
gene and as an aberration. The name of the allele follows the conventions outlined
in section 3. Rules for naming natural transposons and transgene constructs
and their insertion into the genome follow.
Generic naturally occurring transposons
are symbolized as ends{}, where ends stands for the symbol of
a given transposon, such as P for P-element. Doc{}, copia{}
and P{} are examples. A defined natural variant of the transposon family
can be named by including a symbol for that name inside the brackets. A specific
insertion of a given transposon is described by including an additional unique
symbol following the brackets.
Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, copia{}910 has the synonym TE20021.
Symbols for constructed transposons,
or transgene constructs, must always include a construct symbol, which defines
a particular construct. A full transgene construct genotype
consists of the source of transposon ends, included genes, construct symbol,
and insertion identifier, in the form ends{genes=construct-symbol}. Once
defined, ends{construct-symbol} (or less formally, construct-symbol
alone) can be used in most circumstances to refer to a specific transgene
construct. The symbol for a specific insertion of a given transgene
construct has the form ends{construct-symbol}insertion-identifier.
Further details are given in the sections that follow.
Some examples:
- P{w[+mC] ovo[D1-18]=ovoD1-18}
  the full genotype of the P-element transgene construct P{ovoD1-18}
- P{ovoD1-18}13X6
  a viable insertion of the construct P{ovoD1-18}
- P{Scer\GAL4[wB] w[+mW.hs] Ecol\amp[R] Ecol\ori=GawB}
  the full genotype of the transgene construct P{GawB}
- P{GawB}h[1J3]
  an insertion of the construct P{GawB} that disrupts the h gene
- H{w[+mC] Ecol\ori Tn\kan[R] Ecol\lacZ[HZ50a]=Lw2}
  the full genotype of the hobo transgene construct H{Lw2}
- H{Lw2}dpp[151H]
  an insertion of the transgene construct H{Lw2} that disrupts the dpp gene
This nomenclature is formally
similar to that used for aberrations, where the ends{symbol} prefix
is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations,
and the identifier suffix is similar to the gene-allele suffix
of aberrations with associated alleles, or the alphanumeric string suffix
of other aberrations. Specific rules for assembling the components of a transgene
construct genotype follow.
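The general shape ends{genes=construct-symbol}insertion-identifier can be taken apart with simple brace matching. The sketch below is illustrative only: the function name `parse_insertion` is hypothetical, superscripts are assumed to be written in square brackets, and the included genes are split on spaces, so a nested ends{} element that itself contains spaces or an equal sign would need more careful handling.

```python
def parse_insertion(symbol: str) -> dict:
    """Split 'ends{genes=construct}identifier' into its component parts."""
    open_at = symbol.find("{")
    if open_at == -1:
        raise ValueError("no opening brace in symbol")

    # Walk forward to the brace that closes the outermost pair.
    depth = 0
    for index in range(open_at, len(symbol)):
        if symbol[index] == "{":
            depth += 1
        elif symbol[index] == "}":
            depth -= 1
            if depth == 0:
                close_at = index
                break
    else:
        raise ValueError("unbalanced braces in symbol")

    inside = symbol[open_at + 1:close_at]
    # The construct symbol is the final entry, after the last equal sign.
    genes, separator, construct = inside.rpartition("=")
    return {
        "ends": symbol[:open_at],
        "genes": genes.split() if separator else [],
        "construct": construct,
        "insertion": symbol[close_at + 1:] or None,
    }


print(parse_insertion("P{w[+mC] ovo[D1-18]=ovoD1-18}"))
print(parse_insertion("P{GawB}h[1J3]"))
```

A generic natural transposon such as copia{} yields an empty construct symbol, and a symbol with nothing after the closing brace yields `None` as the insertion identifier.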
4.1. Transposon
ends. Pairs of terminal repeats which together form a transposon
are symbolized by opposing braces, {}. The source of the transposon ends is
indicated outside the braces, at the left end of the string by a symbol derived
from the name of the transposon family:
4.1.1. Isolated terminal repeats are indicated with the family symbol followed by 3'
or 5', e.g., P5' represents the isolated 5' end of a P{} transposon.
4.1.2. Multiple
sets of matched transposon ends are indicated by nesting ends{} symbols,
e.g., P{I{neo[RT]W[+]}}.
A P transgene construct containing
ry[+t7.2] and an isolated hobo terminal repeat
from the 5' end of a hobo element would be described as P{ry[+t7.2] H5'}.
Formally, this system can be extended
to any insertion of mobile DNA, for example, the copia, gypsy and FB
elements. Thus, the ct[MR2]
mutation, caused by the insertion of a gypsy element, is called gypsy{}ct[MR2].
When a mobile element inserts into a mutant gene already carrying a mobile element,
it is the new insertion that is named. For example, a jockey insertion
into ct[MR2] generates ct[MRpD]; this insertion
is called jockey{}ct[MRpD]. The name describes the new insertion
which has caused the new phenotype. A full genotype description, including all
sets of transposable element ends, is only provided when the progenitor allele
is also fully described.
FlyBase uses this nomenclature not
only because of its rigor, but also because its more general use may be needed
if such elements are engineered.
4.2. Included
genes. A full transgene construct description lists within the
braces all functional genes, including non-Drosophila genes such as antibiotic
resistance genes, bacterial and phage origins of replication, and the FLP1
recombination target (FRT), separated by spaces. The left-right order
of these elements reflects their 5' to 3' order (with respect to the transposon
ends) within the construct. If the order of a gene is unknown, it is placed
at one end of the list, followed or preceded by a comma.
4.2.1. Drosophila melanogaster
genes. Valid gene symbols are used to name D. melanogaster
genes. Wild-type alleles of intact genes are indicated by a superscripted '+t'
followed by an identifier, e.g., ry[+t7.2] or Adh[+t3.2].
A convenient identifier (used in these examples) is the size of the genomic
fragment carrying the wild-type gene. Transgene-construct-borne genes that do
not confer wild-type function are given unique allele designations without the
preceding '+t', e.g., ftz[B] or y[D225]. Replacement
of promoter or other control sequences can be indicated in the allele designation,
e.g., dpp[hs.PP] for a dpp gene controlled
by a heat shock promoter.
4.2.2. Species of origin.
Species of origin is indicated for non-melanogaster Drosophila genes
present in transgene constructs. A species code composed of the first letter
of the genus (capitalized) and a three letter code, usually the first three
letters of the species (lower case) is added to the gene symbol with a separating
backslash, e.g., Dvir\Dfd[+t7.6] for the wild-type Deformed
gene from Drosophila virilis (see paragraph 2.2.7.).
For genes from species other than
those of Drosophila the valid gene symbols are used following a four-letter
symbol, as above, indicating the species of origin, e.g., Hsap, for
humans, Gdom, for chicken, Hsim, for Herpes simplex,
Ecol for E. coli etc. For viruses, the name or abbreviation,
e.g., Abelson, Adeno5, Cmeg, or symbolic name, e.g., T4, M13,
the Greek symbol lambda, is sometimes used instead of a genus-species-derived
four-letter symbol. In all cases, these symbols are separated from the gene
symbol by a backslash \. A file of these species
abbreviations is available on FlyBase.
FlyBase considers transposable elements,
the mitochondrial DNA and other similar entities to be species (this is because
each can contain several different genes). It is for this reason that, for example,
the P-element Transposase has the symbol P\T in constructs.
4.2.3. Fusion genes. Fusion
genes are defined (by FlyBase) as the fusion of protein coding regions of distinct
genes constructed by in vitro mutagenesis. They are named using the
gene symbols of their component parts, separated by a double colon, e.g., Antp::Scr
or Act88F::Scer\act1.
The order of gene symbols stated
in the fusion gene will be alphabetical. The complexity of these constructs
is such that were each to be named according to its molecular composition, for
example in the 5' to 3' direction, the number of named fusion genes would rapidly
become impractical.
An exception to the 'alphabetical
order' rule will be made for cases where the fusion is between a D. melanogaster
and a non-melanogaster gene. In such cases the melanogaster
gene symbol will be stated first, e.g., tra2::Hsap\SFRS2.
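The ordering rule for fusion gene symbols can be sketched as a sort with two keys. This is an illustration under stated assumptions: a symbol is treated as non-melanogaster whenever it carries a species prefix (a backslash), alphabetical order is taken to be case-insensitive, and the function name `fusion_symbol` is hypothetical.

```python
def fusion_symbol(gene_a: str, gene_b: str) -> str:
    """Join two component gene symbols into a fusion gene symbol."""

    def sort_key(symbol: str):
        # A species prefix (backslash) marks a non-melanogaster gene;
        # False sorts before True, so melanogaster genes come first.
        has_species_prefix = "\\" in symbol
        return (has_species_prefix, symbol.lower())

    first, second = sorted([gene_a, gene_b], key=sort_key)
    return f"{first}::{second}"


print(fusion_symbol("Scr", "Antp"))
print(fusion_symbol("Hsap\\SFRS2", "tra2"))
```

The second call shows why the exception matters: plain alphabetical order would place Hsap\SFRS2 ahead of tra2.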
For historic reasons, some promoter
fusions involving reporter genes such as Ecol\lacZ,
though technically protein fusions, are simply treated as alleles of Ecol\lacZ.
The symbol for the additional gene(s) contributing to the fusion is indicated as
part of a superscript, e.g., Ecol\lacZ[P\T.A92].
In these special cases there is no distinction made between promoter fusions
and protein fusions in the gene name.
4.2.4. Modified genes. Modified
genes, cDNAs and in vitro mutagenized sequences are treated as alleles,
and will be curated by FlyBase as such. They should be named, therefore, by
the same conventions used to name classical alleles. The following allele symbols
have been assigned by FlyBase to the commonly used modified genes of D.
melanogaster:
- w[+mC]
  The mini-white gene constructed by Pirrotta (1988) by deleting the HindIII-XbaI fragment from the long 5'-intron of the w[+] gene. Carried by Casper plasmids and their derivatives.
- w[+mW.hs]
  The mini-white gene constructed by Klemenz et al. (1987). Carried by the W6, W8 family of plasmids and their derivatives.
Genes modified by the addition of a tag allowing the product to be identified,
marked or purified represent a special class of modified genes. Tags are used
to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript
to be identified by in situ hybridization. Tags are also used to
mark a protein, for purposes of purification (e.g., (His)6), for
purposes of identification (epitope tags) or for purposes of targeting to a
cellular compartment (nls tags).
FlyBase considers as tags constructs designed for these purposes and curates
these modified genes as alleles of the tagged gene. Tagged genes have symbols
with the format 'T:y' where T stands for Tag and y
is the species\gene symbol of the tag, e.g., T:Hsap\Myc,
T:Ivir\HA1, T:Hsap\p53,
T:Zzzz\His6 (the Zzzz 'species'
prefix is used when the tag is artificial).
A complete list of tagged gene symbols
and their definitions is available from FlyBase through QuickSearch. Change the 'Species' option from the default 'Dmel' to 'All species'. Ensure the 'Search' option is set as 'ID/Symbol/Name' and 'genes' is selected as the 'Data Class'.
Type 'T:*' (don't use the quotation marks) in the 'Enter text'
field and submit the query.
4.3. Construct
symbol. Every construct must be assigned a symbol which, in conjunction
with the description of the terminal repeats, uniquely describes a transgene
construct, for example, P{lacW},
H{PDelta2-3}. Symbols must
be unique, but should be kept as short as possible.
4.3.1. Full genotype.
In the full genotype of a transgene construct, the construct symbol is the final
entry within the braces, separated from the final gene symbol by the equal sign,
e.g., P{lacZ[P\T.W] w[+mC] amp[R] ori=lacW} is the full genotype of P{lacW}.
4.3.2. Short form and partial
genotypes. Once defined, a transgene construct can be referred to by
either the transgene symbol, e.g., P{lacW}
(or, less formally, lacW),
or the symbol plus insertion identifier (see below) in most contexts. Additional
components can be added as needed for clarity. For example, in stock genotypes
it is preferable to include the visible markers, as in P{w[+mC]=lacW}th[j5C8]
or P{w[+t11.7] ry[+t7.2]=wA}3-1, to avoid misunderstandings about the expected phenotypes of
the flies.
4.4. Insertion
identifier. The right-most position of the transgene symbol, outside
the outer-most bracket, is reserved for a string that identifies a specific
insertion into the genome of the defined construct. There are four cases to
consider for naming insertions.
4.4.1. Insertion hits a known
gene. When a mutant phenotype associated with a transgene construct
insertion is assigned to a known gene, the insertion-induced allele should be
named by the normal rules. Since such insertions cause new alleles, the gene-allele
description is used as the identifier of the associated insertion (just as with
other alleles identified as aberrations). For example, a P{lacW}
insertion referred to as l(2)k05007 and then shown to be an allele
of CycE becomes P{lacW}CycE[k05007].
Insertion-induced alleles in stock genotypes should include the aberration name
of the construct, i.e., P{lacW}CycE[k05007].
In most other circumstances the insertion aberration prefix can be dropped and
the mutation referred to in the usual way, in this case, CycE[k05007].
4.4.2. Insertion defines
a new gene. Often insertions cause a phenotype that cannot be associated
with any known gene. In that case the insertion defines the first allele of
a new gene, which is named by the normal rules, e.g., P{lacW}Trf[1].
4.4.3. A mapped insertion
with no phenotype. If an insertion has no phenotype but is mapped to
the polytene chromosomes, then it is preferable to use the polytene chromosome
subdivision to which it maps as its identifier, e.g., P{bw+L}60B.
If a similar construct already has this name then that of the new one would
be P{bw+L}60B-2 or similar.
If the insertion is not mapped then
there is no alternative but to give the insertion an arbitrary number or code,
e.g., P{A92}A45. This symbol
must be unique and as simple as possible using only characters from the set:
a-z A-Z 0-9 -
5. Cytogenetic descriptions
Breakpoints should be described according to
the revised salivary gland chromosome maps published by C. B. and P. N. Bridges
(see Lindsley and Zimm, 1992), except
for chromosome 4, where the map
of Sorsa (Chromosome maps of
Drosophila Vol. II, CRC Press, 1988) should be used.
5.1. Range
designations. For the location of a single object (breakpoint of
aberration, gene position, site of transposon insertion, etc.) the range is
given as "(d1)(S1)(b1)-(d2)(S2)(b2)", where:
d = numbered division (1 to 102)
S = lettered subdivision (A to F)
b = band number (1 to n, depending upon the particular subdivision)
For ranges not known to the accuracy
of a band, see paragraph 5.5.
If the range encompasses two different
numbered divisions (i.e., d1 does not equal d2), then the full designations
for both the left end and the right end of the range will be used, e.g., 32A3-33A2.
If the range is within a single numbered
division (i.e., d1=d2) but within different subdivisions (i.e.,
S1 does not equal S2), then the numbered division designation is not repeated
to the right of the hyphen, e.g., 32A3-D4.
If the range is within both the same
single numbered division and the same lettered subdivision (i.e.,
d1S1=d2S2), then neither the division nor the subdivision designation will be
repeated, e.g., 32A3-5.
If a location is known to a single
band, then the location will be given as (d1)(S1)(b1) with no hyphen and no
repetition of the band location, e.g., 32A3.
If a location is known to a single
doublet, then the location will be given as (d1)(S1)(b1)-(b1+1) where (b1) and
(b1+1) represent the two succeeding bands of the doublet, e.g., 32A1-2.
If only one end of a location range
is within a doublet, the location will simply refer to the band number maximizing
the range, e.g., 32C1-D5 will be used, not 32C1,2-D5 and 32B4-C2 will be used,
not 32B4-C1,2.
It is sometimes necessary to represent
interbands in data curated by FlyBase. Interbands have the same symbol as the
immediately preceding band, with the suffix symbol +. The interband between
the Bridges' bands 3A4 and 3A5 is, therefore, represented as 3A4+.
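The abbreviation rules for ranges can be sketched as a short formatting function. This is a minimal illustration: the `Band` type and the function name `format_range` are hypothetical, and the doublet-boundary and interband conventions described above are not implemented.

```python
from typing import NamedTuple


class Band(NamedTuple):
    division: int      # numbered division, 1 to 102
    subdivision: str   # lettered subdivision, 'A' to 'F'
    band: int          # band number within the subdivision


def format_range(left: Band, right: Band) -> str:
    """Format a cytological range, dropping repeated designations."""
    full_left = f"{left.division}{left.subdivision}{left.band}"
    if left == right:
        # Location known to a single band: no hyphen, no repetition.
        return full_left
    if left.division != right.division:
        # Different numbered divisions: both ends written in full.
        return f"{full_left}-{right.division}{right.subdivision}{right.band}"
    if left.subdivision != right.subdivision:
        # Same division, different subdivisions: division not repeated.
        return f"{full_left}-{right.subdivision}{right.band}"
    # Same division and subdivision: only the band number is repeated.
    return f"{full_left}-{right.band}"


print(format_range(Band(32, "A", 3), Band(33, "A", 2)))
print(format_range(Band(32, "A", 3), Band(32, "D", 4)))
print(format_range(Band(32, "A", 3), Band(32, "A", 5)))
```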
5.2. Telomeres.
Telomeres are designated by nAt, where n is a chromosome number, A is the chromosome
arm, and t indicates the telomere:
1Lt = the telomere of the left arm of X
1Rt = the telomere of the right arm of X
YLt = the telomere of the long arm of Y
YSt = the telomere of the short arm of Y
2Lt = the telomere of the left arm of 2
2Rt = the telomere of the right arm of 2
3Lt = the telomere of the left arm of 3
3Rt = the telomere of the right arm of 3
4Lt = the telomere of the left arm of 4
4Rt = the telomere of the right arm of 4
If the telomere is of unknown origin, use:
5.3. Centromeres
and centric heterochromatin.
Centromeres are designated as ncen, where n indicates the chromosome, i.e., 1cen,
Ycen, 2cen, 3cen and 4cen.
5.3.1. Centric heterochromatic
blocks will be indicated as hn, where n is a consecutive number.
5.4. Composite
chromosome architecture. The designations of the chromosomes, including
polytene band ranges, heterochromatic blocks and centromeres are:
YLt h1 -- h17 Ycen h18 -- h25 YSt
1Lt 1A1 -- 20F4 h26 -- h32 1cen h33 -- h34 1Rt
2Lt 21A1 -- 40F7 h35 -- h37 h38L 2cen h38R h39 -- h46 41A1 -- 60F5 2Rt
3Lt 61A1 -- 80F9 h47 -- h52 h53L 3cen h53R h54 -- h58 81F1 -- 100F5 3Rt
4Lt h59 -- h61 4cen 101F1 -- 102F8 4Rt
Note that the centromeres of chromosomes
2 and 3 lie within heterochromatic bands h38 and h53 respectively.
Some heterochromatic bands (h25, h42) are divided into two (h25A, h25B, h42A,
h42B) in some stocks.
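As an illustration of how the landmark symbols of sections 5.2 to 5.4 might be recognized by a program, the following Python sketch classifies a symbol as a telomere, a centromere or a heterochromatic block. The function name classify_landmark is hypothetical, and the upper limit of h61 is taken from the composite architecture listed above.

```python
import re

TELOMERES = {"1Lt", "1Rt", "YLt", "YSt", "2Lt", "2Rt", "3Lt", "3Rt", "4Lt", "4Rt"}
CENTROMERES = {"1cen", "Ycen", "2cen", "3cen", "4cen"}
# Blocks split by a centromere (h38, h53) or divided in some stocks (h25, h42).
SPLIT_BLOCKS = {"h38L", "h38R", "h53L", "h53R", "h25A", "h25B", "h42A", "h42B"}
HIGHEST_BLOCK = 61  # h1 to h61 appear in the composite architecture


def classify_landmark(symbol: str) -> str:
    """Return the landmark class of a symbol, or 'unknown'."""
    if symbol in TELOMERES:
        return "telomere"
    if symbol in CENTROMERES:
        return "centromere"
    if symbol in SPLIT_BLOCKS:
        return "heterochromatic block"
    match = re.fullmatch(r"h([1-9][0-9]?)", symbol)
    if match and int(match.group(1)) <= HIGHEST_BLOCK:
        return "heterochromatic block"
    return "unknown"


for s in ("2Lt", "Ycen", "h38L", "h61", "h62"):
    print(s, classify_landmark(s))
```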
5.5. Accuracy
of cytological descriptions. In designating cytological position,
the level of accuracy of the determination should be reflected in the specificity
of the statement.
Some examples should make these distinctions
clear. Note that the polytene subdivision described here, 77B, has 9 bands.
- Case 1 - High level of uncertainty
about subdivision location:
- If the observer thinks that the
location of a rearrangement breakpoint might be in 77B but could also possibly
be in 77A or 77C, then the position should be reported as 77A-C.
- Case 2 - Low level of uncertainty
about subdivision location:
- If the observer's best estimate
is that the true breakpoint position is very likely to be in 77B, then the
observer should report the position as 77B.
- Case 3 - No uncertainty about
subdivision location:
- If the observer is absolutely
certain that the location is within 77B, then the location should be reported
as 77B1-9.
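The three cases can be summarized in a short Python sketch. The function name report_position and the certainty labels ("uncertain", "likely", "certain") are hypothetical choices made for this illustration. The sketch only covers a high-uncertainty estimate whose flanking subdivisions lie in the same numbered division.

```python
SUBDIVISIONS = "ABCDEF"


def report_position(division, subdivision, certainty, band_count=None):
    """Return a cytological statement whose specificity reflects its accuracy."""
    name = f"{division}{subdivision}"
    if certainty == "certain":
        # Case 3: no uncertainty, so the full band range of the subdivision is given.
        if band_count is None:
            raise ValueError("band_count is required when the location is certain")
        return f"{name}1-{band_count}"
    if certainty == "likely":
        # Case 2: low uncertainty, so the subdivision alone is reported.
        return name
    if certainty == "uncertain":
        # Case 1: high uncertainty, so the flanking subdivisions are included.
        i = SUBDIVISIONS.index(subdivision)
        if i == 0 or i == len(SUBDIVISIONS) - 1:
            raise ValueError("flanking subdivision lies in another division")
        return f"{division}{SUBDIVISIONS[i - 1]}-{SUBDIVISIONS[i + 1]}"
    raise ValueError(f"unrecognized certainty label: {certainty}")


print(report_position(77, "B", "uncertain"))
print(report_position(77, "B", "likely"))
print(report_position(77, "B", "certain", band_count=9))
```

For subdivision 77B, which has 9 bands, the three calls give 77A-C, 77B and 77B1-9.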
6. Chromosome aberrations
Chromosome aberrations have names
that consist of three parts: a prefix indicating the class of aberration; an
indication, enclosed in parentheses, of the chromosome or chromosomes (or their
arms) involved; and a specific designation that identifies the particular rearrangement.
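The three-part structure of an aberration name lends itself to simple parsing. The Python sketch below is an assumption about how one might split a name into its parts, not a FlyBase tool; the name parse_aberration and the regular expression are hypothetical.

```python
import re

# prefix, parenthesized chromosome designation, specific designation (suffix)
ABERRATION = re.compile(r"([A-Za-z]+)\(([^()]*)\)(\S+)")


def parse_aberration(symbol):
    """Split an aberration name into (prefix, chromosome entries, suffix)."""
    match = ABERRATION.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not an aberration name: {symbol!r}")
    prefix, chromosomes, suffix = match.groups()
    # Entries within the parentheses are separated by semicolons.
    return prefix, chromosomes.split(";"), suffix


print(parse_aberration("T(2;3)rg35"))
print(parse_aberration("Df(3R)kar5l"))
```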
6.1. General
principles for naming aberrations.
6.1.1. Aberrations
not named after a gene: The suffix (i.e., the component of the name
following the parentheses) should include only letters and digits. There should
be no superscripts or subscripts except for the particular cases of synthetic
inversions with L and R superscripts (see 6.4.4). They should
not contain spaces. The characters ( and ) are only to be used to enclose the
designation of a chromosome or chromosome arm.
6.1.2. Aberrations
named after a gene but not associated with an allele: Here the association with
the gene carries circumstantial information about the aberration's breakpoints.
The suffix should comprise the gene symbol, followed by a hyphen if needed for
clarity, followed by any alphanumeric of the investigator's choosing. There
should be no superscripts.
6.1.3. If a gene
whose symbol appears in an aberration changes its name, e.g., for reasons of
newly-discovered allelism, then this name change is propagated to the aberration(s)
in question. The old name will become a synonym.
6.1.4. Aberrations
named for a specific associated allele: Here the suffix should be exactly the
same as the allele designation, i.e. the gene symbol followed by the
superscripted allele symbol. If the allele designation (either gene or allele
part) changes, that change will be propagated to the aberration.
6.2. Translocations.
6.2.1. Translocations
have the symbol T(n1;n2...)m, where n1, n2 ... indicate the
numbers of the chromosomes involved in the translocation.
When chromosomes are listed within
the parenthetical information of a translocation symbol they are listed in the
order: 1, Y, 2, 3, 4. The numbers of the different chromosomes are
separated by semicolons, with no spaces.
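A minimal Python sketch of this ordering rule follows. The function name translocation_symbol is hypothetical, and the requirement of at least two distinct chromosomes is an assumption of the sketch.

```python
CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]


def translocation_symbol(chromosomes, suffix):
    """Build T(n1;n2...)m with the chromosomes listed in the order 1, Y, 2, 3, 4."""
    unknown = set(chromosomes) - set(CHROMOSOME_ORDER)
    if unknown:
        raise ValueError(f"unknown chromosome(s): {sorted(unknown)}")
    ordered = sorted(set(chromosomes), key=CHROMOSOME_ORDER.index)
    if len(ordered) < 2:
        raise ValueError("a translocation involves at least two chromosomes")
    # Semicolons separate the chromosomes, with no spaces.
    return f"T({';'.join(ordered)}){suffix}"


print(translocation_symbol(["3", "2"], "rg35"))
print(translocation_symbol(["3", "1", "2"], "OR9"))
```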
6.2.2. The separable components
of translocations.
Previous conventions for naming such
aneuploid segregants have been difficult to employ and do not contain sufficient
information in the derivative name to permit automated recognition of the relationship
between aneuploid segregant and euploid progenitor.
FlyBase will employ the following
conventions for different classes of euploid chromosomal aberrations and their
aneuploid derivatives.
6.2.2.1. Translocation segregants.
Translocations, standardly named T(n1;n2)m, consist of two or more
translocated chromosomes, each of which can potentially exist as an aneuploid
segregant. Such segregants will be named using telomeres of the rearranged chromosomes
as landmarks for specific segregants. Two-break translocations are often called
reciprocal translocations if two chromosome segments have simply been exchanged.
The general form of the name of a
segregant will be Ts(n1Pt;n2Qt)m. Ts stands for 'Translocation
segregant', n1Pt and n2Qt are the designations of the landmark
telomeres (e.g., 2Lt, 3Rt) and m is the same suffix as that of the progenitor
translocation from which the segregant is derived.
- Example 1: Two-break reciprocal
translocation. No ambiguity about the locations of either breakpoint relative
to the centromere.
- T(2;3)rg35
(= T(2;3)27E-F;62C2-D1)
- The two aneuploid segregants are
therefore named:
- Ts(2Lt;3Rt)rg35 (=
2Lt-27E|62D1-3Rt)
- Ts(2Rt;3Lt)rg35 (=
2Rt-27F|62C2-3Lt)
- Example 2: Three-break reciprocal
translocation. No ambiguity about the locations of any breakpoint relative
to the centromere.
- T(1;2;3)OR9
(= T(1;2;3)19-20;49F;81F)
- The three aneuploid segregants
are accordingly named:
- Ts(1Lt;3Lt)OR9 (= 1Lt-19|81F-3Lt)
- Ts(1Rt;2Rt)OR9 (= 1Rt-20|49F-2Rt)
- Ts(2Lt;3Rt)OR9 (= 2Lt-49F|81F-3Rt)
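The segregant names in the two examples can be generated mechanically once the landmark telomeres are known. The Python sketch below is illustrative only; the function names are hypothetical, and listing the telomeres in the chromosome order 1, Y, 2, 3, 4 is an assumption inferred from the examples rather than a stated rule.

```python
CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]


def segregant_symbol(telomere_a, telomere_b, suffix):
    """Build Ts(n1Pt;n2Qt)m from two landmark telomeres and the progenitor suffix."""
    def rank(telomere):
        # The first character of a telomere symbol is its chromosome.
        return CHROMOSOME_ORDER.index(telomere[0])

    first, second = sorted([telomere_a, telomere_b], key=rank)
    return f"Ts({first};{second}){suffix}"


def segregant_order(telomere_a, breakpoint_a, breakpoint_b, telomere_b):
    """Describe the new order of a segregant, with | marking the junction."""
    return f"{telomere_a}-{breakpoint_a}|{breakpoint_b}-{telomere_b}"


print(segregant_symbol("3Rt", "2Lt", "rg35"), "=",
      segregant_order("2Lt", "27E", "62D1", "3Rt"))
```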
6.2.2.2. Complex segregants
and recombinants. For many complex translocations or inversions with
four or more breakpoints, multiple aneuploid segregants or recombinants can
potentially occur. It is impossible to invent a naming scheme for these complex
cases that would automatically reveal the specific aneuploid chromosome complement.
In such instances, resulting aneuploids will be given appropriate names as follows:
The first duplication or deletion
is assigned the unique suffix of the parental euploid rearrangement. The new
order of the resulting chromosome must be reported.
Succeeding duplications or deletions
are assigned other unique suffixes. Their new orders must also be reported.
6.3. Rings.
Ring chromosomes have the symbol R(n)m, where n indicates
the number of the chromosome and m is a specific designation.
6.4. Inversions.
6.4.1. Inversions
have the symbol In(nA)m, where n indicates the number of the
chromosome involved, A the arm or arms involved and m is a
specific designator.
In the case of multiple-break intrachromosomal
rearrangements, the distinction between inversions and transpositions often
becomes ambiguous. An intrachromosomal rearrangement that can be partitioned
into a duplicated and a deficient product by exchange with a normal-sequence
chromosome is designated a transposition even though it may carry an inverted
segment; otherwise, it is designated an inversion.
6.4.2. If it is
not known whether an inversion is paracentric (does not include the centromere)
or pericentric (includes the centromere), then the indicator of chromosome arm(s)
is omitted, i.e., In(n)m.
6.4.3. By convention,
In(1) implies In(1L).
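The rules in 6.4.1 and 6.4.2 can be illustrated with a short Python sketch. The function name inversion_symbol and the suffix m1 are hypothetical. The sketch only accepts the arm indicators L, R and LR and does not apply the In(1) convention of 6.4.3.

```python
def inversion_symbol(chromosome, suffix, arms=None):
    """Build In(nA)m, omitting the arm indicator when the arms are unknown."""
    if arms is None:
        # Not known whether the inversion is paracentric or pericentric.
        return f"In({chromosome}){suffix}"
    if arms not in ("L", "R", "LR"):
        raise ValueError(f"unsupported arm indicator: {arms!r}")
    return f"In({chromosome}{arms}){suffix}"


print(inversion_symbol("2", "m1", "L"))   # paracentric, left arm
print(inversion_symbol("3", "m1", "LR"))  # pericentric
print(inversion_symbol("3", "m1"))        # arm(s) unknown
```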
6.4.4. Recombinant products
between two inversions. Recombination between similar inversions may
produce viable recombinant inversions with the left end of one and the right
end of the other. Superscripts L and R are used to identify
the sources of the two ends, for example, In(2L)CyLtR.
6.5. Transpositions.
Among interchromosomal rearrangements, the term transposition is reserved for
that class in which the telomeres of the chromosomes involved are coupled (that
is to say, form the two ends of a single DNA molecule) as in wild-type. Rearrangements
that alter the pairing of telomeres are classified as translocations.
In the case of multiple-break intrachromosomal
rearrangements, the distinction between inversions and transpositions often
becomes ambiguous. An intrachromosomal rearrangement that can be partitioned
into a duplicated and a deficient product by exchange with a normal-sequence
chromosome is designated a transposition even though it may carry an inverted
segment; otherwise, it is designated an inversion.
6.5.1. Transpositions
have the symbol Tp(n1;n2)m, where n1 is the 'donor' chromosome,
n2 the 'recipient' chromosome and m a specific designation.
For intrachromosomal transpositions n1 = n2.
6.5.2. Separable components
of transpositions.
6.5.2.1. Interchromosomal
transpositions. Segregants of interchromosomal transpositions will
continue to be referred to as in the past. For a transposition with the name
Tp(n1;n2)m, the chromosome segregant containing the duplicated material
will be named Dp(n1;n2)m, and the chromosome containing the deleted
material will be named Df(n1A)m, where A refers to the chromosome
arm of the deletion.
- Example:
Tp(3;1)kar5l (= Tp(3;1)87C7-D1;88E2-3;20)
- The two aneuploid segregants are:
- Dp(3;1)kar5l
(= 1Lt-20|87D1-88E2|20-1Rt)
- Df(3R)kar5l
(= 3Lt-87C7|88E3-3Rt)
6.5.2.2. Intrachromosomal
transpositions. Segregants here are produced by recombination with
a structurally normal chromosome, not by chromosome segregation. For transpositions
in which the transposed segment is in the uninverted orientation relative to
the standard map, there may be two potential duplication and two potential deletion
derivatives (one set resulting from recombination events in the region between
the deficiency and duplication components of the transposition, and one set
resulting from recombination events within the transposed segment). For transpositions
of the type Tp(n1;n1)m, the reported duplication segregant will be
named Dp(n1;n1)m and the new order must be reported to eliminate any
ambiguity. Similarly, the reported deletion recombinant is referred to as Df(n1A)m,
where A refers to the chromosome arm bearing the deletion. In rare
cases in which the alternative duplication or deletion recombinant (generated
by recombination within the transposed segment) is also reported, it will be
given a different suffix from the progenitor transposition and the new order
will be reported.
- Example: Tp(3;3)DlII13
(= Tp(3;3)88F5-9;91A3-8;92A2)
- The primary aneuploid recombinants
would then be:
- Dp(3;3)DlII13
(= 3Lt-92A2|88F9-91A3|92A2-3Rt)
- Df(3R)DlII13
(= 3Lt-88F5|91A8-3Rt)
If, subsequently, the other deletion
or duplication recombinant is generated, it will be given a novel suffix, perhaps
completely unrelated to the progenitor, e.g.:
- Df(3R)xxx (= 3Lt-91A3|92A2-3Rt)
- Dp(3;3)xxx
(= 3Lt-88F5|91A8-92A2|88F5-3Rt)
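For both interchromosomal and intrachromosomal transpositions, the names of the primary duplication and deletion derivatives follow directly from the name of the progenitor. The Python sketch below illustrates this derivation; the function name transposition_segregants is hypothetical, and the arm bearing the deletion must be supplied by the caller because it cannot be read from the transposition name.

```python
import re

TRANSPOSITION = re.compile(r"Tp\(([^;()]+);([^;()]+)\)(\S+)")


def transposition_segregants(tp_symbol, deleted_arm):
    """Return the (Dp, Df) names derived from a Tp(n1;n2)m name."""
    match = TRANSPOSITION.fullmatch(tp_symbol)
    if match is None:
        raise ValueError(f"not a transposition name: {tp_symbol!r}")
    donor, recipient, suffix = match.groups()
    duplication = f"Dp({donor};{recipient}){suffix}"
    # The deficiency is named for the donor chromosome and the deleted arm.
    deficiency = f"Df({donor}{deleted_arm}){suffix}"
    return duplication, deficiency


print(transposition_segregants("Tp(3;1)kar5l", "R"))
print(transposition_segregants("Tp(3;3)DlII13", "R"))
```

The later-reported alternative recombinants of an intrachromosomal transposition receive novel suffixes, so they cannot be derived this way.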
6.6. Deficiencies
(deletions).
Deficiencies (deletions) have the
symbol Df(nA)m, where n is the number of the deleted chromosome,
A is the chromosome arm and m is a specific designator.
Intragenic deletions are not treated
as deficiencies, but as alleles; at least two adjacent loci must be removed
or disrupted before a lesion is considered a deletion.
6.7. Duplications.
Duplications have the symbol Dp(n1;n2)m,
where n1 is the 'donor' chromosome, n2 the recipient and m
a specific designator; n1 may equal n2.
Duplications may be: tandem (in direct
or inverted order), insertional or free. Direct and inverted tandem duplications
are not distinguished by their symbols. Ambiguity must be avoided by explicit
description of the new order (see section 7.1 New order).
6.7.1. When the
duplicated sequences are carried as a free centric element, the letter f
(free) follows the semicolon within the parentheses, replacing n2;
e.g., Dp(1;f)101.
6.7.2. Higher order repeats.
Higher-order repeats are also symbolized Dp, with the number of repeats
indicated in the parenthetical chromosomal designation, i.e., Dp(1;1)
= duplication, Dp(1;1;1) = triplication, and so forth.
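The variants of the duplication symbol described in 6.7 can be sketched in Python as follows. The function name duplication_symbol and its parameters are hypothetical, and treating higher-order repeats as involving a single chromosome is an assumption of the sketch.

```python
def duplication_symbol(donor, suffix, recipient=None, free=False, copies=2):
    """Build a Dp symbol: insertional or tandem, free, or a higher-order repeat."""
    if free:
        # Free centric element: f replaces the recipient chromosome.
        return f"Dp({donor};f){suffix}"
    if recipient is None:
        recipient = donor  # n1 may equal n2
    if copies < 2:
        raise ValueError("a duplication has at least two copies")
    if copies > 2:
        if recipient != donor:
            raise ValueError("this sketch only builds repeats within one chromosome")
        # One entry per repeat: Dp(1;1;1) is a triplication.
        inner = ";".join([donor] * copies)
    else:
        inner = f"{donor};{recipient}"
    return f"Dp({inner}){suffix}"


print(duplication_symbol("1", "101", free=True))
print(duplication_symbol("1", "m1", copies=3))
```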
6.8. Y derivatives.
In the past many Y chromosome derivatives (e.g., marked-Y
chromosomes) were named in a rather special way, as m1Ym2, where m1
is a marker (or markers) carried on YL and m2 a marker (or
markers) carried on YS. Such chromosomes should be named as duplications,
following the normal rules. Thus a y+Y is Dp(1;Y)y+
and Ymal+ is Dp(1;Y)mal+.
6.9. Autosynaptic
elements. A pericentric inversion can be converted to two reciprocal
autosynaptic elements by recombination between the inverted segment and a normal
homolog. For a pericentric of the type In(nLR)m, the two autosynaptic
products are LS( |